The goal of maximum likelihood is to fit a distribution to some data.
We want to find the values of our model's parameters under which the observed data is most probable. (This is subtly different from Bayes' theorem, which would give us the most probable parameters given the data; maximum likelihood maximizes the probability of the data given the parameters.)
If we assume that the data was generated by a Gaussian distribution, the likelihood of a single data point is equal to the Gaussian probability density function evaluated at that point.
We basically brute-force the fit: we try a bunch of Gaussian distributions on the data and keep the one that maximizes the likelihood function.
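The brute-force idea can be sketched in a few lines. The measurement value, the fixed standard deviation, and the candidate grid below are all made up for illustration; only the technique (score each candidate mean by the Gaussian density and keep the best) comes from the text.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and std sigma, evaluated at x."""
    coef = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# One hypothetical measurement, with sigma held fixed for simplicity.
x = 32.0
sigma = 2.5

# Brute force: try a grid of candidate means, score each by likelihood.
candidates = [mu / 10.0 for mu in range(250, 400)]  # 25.0, 25.1, ..., 39.9
best_mu = max(candidates, key=lambda mu: gaussian_pdf(x, mu, sigma))
# With a single data point, the likelihood peaks when the candidate
# mean sits exactly on the measurement.
```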
So we begin by fitting Gaussians: we start with a first guess for the mean, which yields some value of the likelihood.
But we can do better, right?
In fact, if we plug in a better guess for the mean, we get a likelihood of 0.12, which is definitely better!
By the way, if we plot the likelihood over all the possible values of the mean, we get a curve whose peak sits at the maximum likelihood estimate.
If we have multiple data points, the likelihood function is the product of the individual likelihoods, one Gaussian density per data point.
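Here is a minimal sketch of that product, with made-up measurements and made-up parameter values. It also shows the usual practical trick: maximizing the log-likelihood (a sum) instead of the likelihood (a product), which has the same maximizer but is numerically stabler.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and std sigma, evaluated at x."""
    coef = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

data = [31.0, 32.5, 34.0]  # hypothetical measurements
mu, sigma = 32.5, 1.5      # hypothetical candidate parameters

# Joint likelihood: product of each point's density under the same Gaussian.
likelihood = 1.0
for x in data:
    likelihood *= gaussian_pdf(x, mu, sigma)

# Equivalent log-likelihood: the product turns into a sum.
log_likelihood = sum(math.log(gaussian_pdf(x, mu, sigma)) for x in data)
```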
So we take the derivative of the likelihood function with respect to the mean, and set it to zero to find the maximum.
Obviously, you can do the same to find the standard deviation: you lock in the mean and take the derivative with respect to the standard deviation.
In order to get the maximum likelihood parameters for multiple data points, we must multiply all the individual likelihood functions, take the derivative of that product, and solve for the mean and the standard deviation.
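The key step of that maximization can be sketched as follows (working with the log-likelihood, since the log turns the product into a sum without moving the maximum):

```latex
L(\mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}
    \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)

\ln L(\mu, \sigma) = -n \ln\!\left(\sigma\sqrt{2\pi}\right)
    - \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{2\sigma^2}

\frac{\partial}{\partial \mu} \ln L
    = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0
    \quad\Longrightarrow\quad
    \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

Setting the derivative to zero gives the sample mean, exactly as claimed below.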
There is a full proof justifying that the maximum likelihood estimate of the mean is equal to the mean of the measurements. The proof is covered in the link below:
Similarly, there is a proof showing that the width of the fitted distribution is equal to the standard deviation of the measurements:
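Those two closed-form results are easy to check numerically. The data below is made up; the formulas are the standard maximum likelihood estimators: the sample mean, and the square root of the average squared deviation (note the division by n, not n - 1, which is what the likelihood maximization actually gives).

```python
import math

data = [4.0, 7.0, 6.0, 5.0, 8.0]  # hypothetical measurements
n = len(data)

# MLE of the mean: the plain average of the measurements.
mu_hat = sum(data) / n

# MLE of the standard deviation: sqrt of the average squared deviation
# (the "population" standard deviation, dividing by n rather than n - 1).
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)
```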
These notions may seem obvious, but now we have the math to back them up.